K-Nearest Neighbor Search by Random Projection Forests
Authors
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy in terms of a fast-decaying missing rate of kNNs and of the discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights into rpForests by showing the exponential decay of the probability that neighboring points are separated by the ensemble of random projection trees as the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
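To make the mechanism described in the abstract concrete, here is a minimal pure-Python sketch of a random projection forest for kNN search. It is an illustration under stated assumptions, not the authors' implementation. Each tree recursively projects its points onto a random direction, assumed here to have i.i.d. Gaussian entries, and splits at the median projection until a node holds at most `leaf_size` points. A query is routed to one leaf per tree, the union of those leaves forms the candidate set, and candidates are ranked by exact Euclidean distance. The names `RPTree`, `RPForest`, `leaf_size`, `n_trees`, `missing_rate` and `kth_distance_gap` are hypothetical. `kth_distance_gap` is only one plausible reading of "discrepancy in the k-th nearest neighbor distances".

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def _dot(u: Point, v: Point) -> float:
    return sum(a * b for a, b in zip(u, v))


def _dist(u: Point, v: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class RPTree:
    """One random projection tree over point indices (hypothetical design)."""

    def __init__(self, data: List[Point], leaf_size: int, rng: random.Random):
        self.data = data
        self.leaf_size = leaf_size
        self.rng = rng
        self.root = self._build(list(range(len(data))))

    def _build(self, idx: List[int]):
        if len(idx) <= self.leaf_size:
            return ("leaf", idx)
        # Assumed choice: a random direction with i.i.d. standard Gaussian entries.
        w = [self.rng.gauss(0.0, 1.0) for _ in range(len(self.data[0]))]
        proj = sorted((_dot(self.data[i], w), i) for i in idx)
        mid = len(proj) // 2  # assumed median split; both halves are non-empty
        threshold = proj[mid][0]
        left = [i for _, i in proj[:mid]]
        right = [i for _, i in proj[mid:]]
        return ("node", w, threshold, self._build(left), self._build(right))

    def leaf_for(self, q: Point) -> List[int]:
        """Route a query down the tree and return the indices in its leaf."""
        node = self.root
        while node[0] == "node":
            _, w, threshold, left, right = node
            node = left if _dot(q, w) < threshold else right
        return node[1]


class RPForest:
    """An ensemble of independently grown random projection trees."""

    def __init__(self, data: List[Point], n_trees: int = 10,
                 leaf_size: int = 10, seed: int = 0):
        rng = random.Random(seed)
        self.data = data
        self.trees = [RPTree(data, leaf_size, rng) for _ in range(n_trees)]

    def query(self, q: Point, k: int) -> List[int]:
        # Union of the query's leaves over all trees, then exact re-ranking.
        candidates = set()
        for tree in self.trees:
            candidates.update(tree.leaf_for(q))
        return sorted(candidates, key=lambda i: _dist(self.data[i], q))[:k]


def exact_knn(data: List[Point], q: Point, k: int) -> List[int]:
    """Brute-force kNN, used as ground truth."""
    return sorted(range(len(data)), key=lambda i: _dist(data[i], q))[:k]


def missing_rate(approx: List[int], exact: List[int]) -> float:
    """Fraction of the true kNNs that the approximate search failed to return."""
    return 1.0 - len(set(approx) & set(exact)) / len(exact)


def kth_distance_gap(data: List[Point], q: Point,
                     approx: List[int], exact: List[int]) -> float:
    """Hypothetical discrepancy: approximate minus exact k-th NN distance."""
    if len(approx) < len(exact):
        return math.inf  # fewer than k candidates were found
    return _dist(data[approx[-1]], q) - _dist(data[exact[-1]], q)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(5)] for _ in range(500)]
    forest = RPForest(data, n_trees=20, leaf_size=15, seed=42)
    q = data[0]
    approx, exact = forest.query(q, 10), exact_knn(data, q, 10)
    print(missing_rate(approx, exact), kth_distance_gap(data, q, approx, exact))
```

Because each tree is grown and queried independently, the loop over `self.trees` is the natural place to spread work across cores or machines, which is the property the abstract's parallelization claim relies on. Adding trees enlarges the candidate set, so the missing rate can only stay the same or fall, at the cost of more exact distance computations.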
Similar articles
Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph
We introduce a new nearest neighbor search algorithm. The algorithm builds a nearest neighbor graph in an offline phase and when queried with a new point, performs hill-climbing starting from a randomly sampled node of the graph. We provide theoretical guarantees for the accuracy and the computational complexity and empirically show the effectiveness of this algorithm.
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to many kinds of library resources. However, classifying documents within a large amount of data is still an issue and demands time and energy to find specific documents. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests
We propose a new data structure, the generalized randomized k-d forest, or k-d GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus ac...
Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests
Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Using suitable inventory methods is necessary, and the accuracy of sampling methods depends on the inventory net and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated by three equations (Byth and Riple, 1980; Cotam and Curtis, 1956 and Cota...
Journal
Journal title: IEEE Transactions on Big Data
Year: 2021
ISSN: 2372-2096, 2332-7790
DOI: https://doi.org/10.1109/tbdata.2019.2908178